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ABSTRACT

This experiment was designed to assess the ability of
item writers to construct truly parallel tests. It follows the
"duplicate-construction experiment" proposed by Cronbach, who argues that if
the universe description and sampling are ideally refined, the two 
independently constructed tests will be entirely equivalent, and that 
within the limits of item sampling error any person would receive the 
same score on both tests. Two item-writing committees developed
forty-item driver's license examinations based solely on the material in
the driver's manual. The two independently developed tests were
administered to 117 high school students who took both forms three to
five days apart. The two forms were not equivalent according to 
Cronbach's criterion. As Cronbach suggests, inspection of individual 
items designed to measure the same general area, but worded 
differently, revealed some marked differences in item difficulty. His 
suggestion that the standard error of measurement be estimated from
split-half reliabilities seemed unwarranted. The author states that
perhaps on tests of very heterogeneous content domains test-retest
coefficients would be more appropriate. (RC) 
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Many educators, realizing the problems and limitations of norm
based interpretations of test scores, have become increasingly interested
in assessing students with respect to absolute performance standards.
But such absolute standards are rarely as absolute as they appear.
While the interpretation of a student's performance level may be independent
of the performance of other students, it may be very dependent on who
happened to write the test questions. Unless the content universe is very
precisely defined, different item writers could construct tests on which
the same student would receive quite different scores, making absolute
interpretations of the scores meaningless. Cronbach (1971) has suggested
an experimental method for assessing the adequacy of a content universe
definition which he labels a "duplicate-construction experiment." In such
an experiment two completely independent groups of item writers, given the
same definition of a particular domain of tasks (or universe), and the same
passing standard (in terms of percent correct), write tests of a prespecified
number of items. Cronbach argues that if the universe description and sampling
are ideally refined, the two independently constructed tests will be entirely
equivalent, and that within the limits of item sampling error any person
would receive the same score on both tests. To be more precise, Cronbach
suggests that the mean of the squared differences between scores of both
tests should not exceed the sum of the squared standard errors of measurement
of the two tests, where the standard errors could be derived from split-half
analyses of the two tests. The current experiment was designed to assess
the ability of item writers to construct such truly parallel tests, and to
identify any practical problems in using Cronbach's model.
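
The criterion described above can be sketched in code. The following is a
minimal illustration, not Cronbach's or the author's own procedure: it assumes
an odd-even split, a Spearman-Brown step-up of the half-test correlation, and
the usual formula SEM = SD * sqrt(1 - reliability). The function names and the
tiny 0/1 response matrices are hypothetical and exist only to exercise the code.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation of two equal-length score lists.

    Assumes both lists have non-zero variance.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_sem(item_matrix):
    """Standard error of measurement from an odd-even split (an assumption).

    item_matrix: one row per examinee, one 0/1 entry per item.
    """
    odd = [sum(row[0::2]) for row in item_matrix]
    even = [sum(row[1::2]) for row in item_matrix]
    total = [sum(row) for row in item_matrix]
    r_half = pearson_r(odd, even)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown step-up
    return pstdev(total) * sqrt(1 - r_full)


def duplicate_construction_check(items_a, items_b):
    """Return (mean squared difference, sum of squared SEMs, equivalent?)."""
    totals_a = [sum(row) for row in items_a]
    totals_b = [sum(row) for row in items_b]
    msd = mean((a - b) ** 2 for a, b in zip(totals_a, totals_b))
    sem_sq = split_half_sem(items_a) ** 2 + split_half_sem(items_b) ** 2
    return msd, sem_sq, msd <= sem_sq


# Hypothetical toy data: 4 examinees, 4 items per form (not from the study).
form_a = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
form_b_same = [row[:] for row in form_a]   # every examinee scores identically
form_b_reversed = form_a[::-1]             # examinees change rank completely

msd_same, sem_same, ok_same = duplicate_construction_check(form_a, form_b_same)
msd_rev, sem_rev, ok_rev = duplicate_construction_check(form_a, form_b_reversed)
print(msd_same, sem_same, ok_same)
print(msd_rev, sem_rev, ok_rev)
```

With identical forms the mean squared difference is zero and the criterion is
met. When the examinees' ranks are reversed, the mean squared difference far
exceeds the sum of the squared standard errors and the forms are judged not
equivalent.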



Method: In his hypothetical discussion Cronbach used a test based on
"knowledge of the State Motor Vehicle Code." A similar task was used
in the current experiment since it allows for a fairly precise universe
definition (forty-two pages of the state-published Driver's Manual of
Virginia), and was a topic with which all the item writers, as licensed
drivers, were familiar. The two item-writing committees consisted of
about twelve members each from an introductory graduate tests and
measurements course. They were instructed to develop forty-item driver's
license exams based solely on material in the manual. They were asked to
suppose that the state had established 75% correct answers as the minimal
competency standard. The two independently developed tests were then
administered to 117 driver education students from three rural high
schools. Half of the students took form A first and half of them took
form B first, followed three to five days later by the other form.

Results and Implications: The two forms were not equivalent according
to Cronbach's criterion ($\sum (X_A - X_B)^2 / N$ = [illegible].00; and the
sum of the squared standard errors = 16.60). The correlation between the
two forms was .60, indicating some considerable changes in relative
position from one form to the other. The mean score on form A was 22.4
(S.D. = 4.4) while on form B it was 26.8 (S.D. = 4.0). The lack of direct
comparability suggests a major difficulty of making absolute
interpretations of test scores; statements that a student has mastered 75%
of the relevant information because he correctly answered 75% of the test
items make little sense if on another test of the same information, but
constructed by a different group, the student gets 60% correct.
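
As a rough cross-check, the left-hand side of the criterion can be approximated
from the reported summary statistics alone. This sketch relies on the identity
that the mean squared difference equals the variance of the difference scores
plus the squared mean difference. It assumes the reported means, standard
deviations, and correlation are exact, although they are rounded in the text,
and it ignores the small N versus N-1 distinction, so the result is only
approximate. The variable names are illustrative.

```python
# Summary statistics as reported in the text (forty-item forms, N = 117).
mean_a, sd_a = 22.4, 4.4
mean_b, sd_b = 26.8, 4.0
r_ab = 0.60
sum_sq_sem = 16.60  # reported sum of the squared standard errors

# Variance of the difference scores (form B minus form A).
var_diff = sd_a**2 + sd_b**2 - 2 * r_ab * sd_a * sd_b

# Mean squared difference = variance of differences + squared mean difference.
approx_msd = var_diff + (mean_b - mean_a) ** 2

print(round(approx_msd, 2), approx_msd > sum_sq_sem)
```

Under these assumptions the mean squared difference comes to about 33.6,
roughly twice the reported sum of the squared standard errors. Somewhat more
than half of that figure comes from the 4.4-point difference between the form
means, and the rest from changes in relative position.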

As Cronbach suggests it should, inspection of individual items
apparently designed to measure the same general area, but worded
differently, revealed some marked differences in item difficulty.
For example, both test forms have questions based on a table of maximum
speed limits in the Manual. On one form 74% of the students correctly
identified the speed limit on interstate highways as 70 m.p.h., but on
the other form only 31% recognized that the speed limit on limited access
highways was also 70 m.p.h. Only a very precise domain definition would
be likely to differentiate between knowledge of these two speed limits.

Cronbach's suggestion that the standard error of measurement be
estimated from split-half reliabilities seems unwarranted considering
his statement that "nothing in the logic of content validation requires
that the universe or the test be homogeneous in content" (1971, p. 457),
and his further statements that high item intercorrelations have nothing
to do with content validity. Perhaps on tests of very heterogeneous
content domains test-retest coefficients would be more appropriate.
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